The present invention relates to a method and an apparatus for speech synthesis utilizing a rule-based synthesis method, and a storage medium storing computer-readable programs for realizing the speech synthesizing method.
As a method of controlling a phoneme duration, a conventional rule-based speech synthesizing apparatus employs a control-rule method determined based on statistics related to a phoneme duration (Yoshinori SAGISAKA, Youichi TOUKURA, “Phoneme Duration Control for Rule-Based Speech Synthesis,” The Journal of the Institute of Electronics and Communication Engineers of Japan, vol. J67-A, No. 7 (1984) pp. 629-636), or a method of employing Categorical Multiple Regression as a technique of multiple regression analysis (Tetsuya SAKAYORI, Shoichi SASAKI, Hiroo KITAGAWA, “Prosodies Control Using Categorical Multiple Regression for Rule-Based Synthesis,” “Report of the 1986 Autumn Meeting of the Acoustic Society of Japan,” 3-4-17 (1986-10)).
However, with the above conventional techniques, it is difficult to specify the speech-production time of a phoneme string. For instance, in the control-rule method, it is difficult to determine a control rule that corresponds to a specified speech-production time. Moreover, if input data includes an exception to the control rules in the control-rule method, or if a satisfactory estimation value is not obtained in the method of Categorical Multiple Regression, it becomes difficult to obtain a phoneme duration that sounds natural.
In a case of controlling a phoneme duration by using control rules, it is necessary to weight the statistics (average value, standard deviation and so on) while taking into consideration the combination of preceding and succeeding phonemes, or it is necessary to set an expansion coefficient. There are various factors to be manipulated, e.g., the combination of phonemes in each case, and parameters such as weighting and expansion coefficients. Moreover, the operation method (control rules) must be determined by rule of thumb. Therefore, in a case where a speech-production time of a phoneme string is specified, the number of combinations of phonemes becomes extremely large. Furthermore, it is difficult to determine control rules which are applicable to any combination of phonemes and with which the total phoneme duration is close to the specified speech-production time.
The present invention is made in consideration of the above situation, and has as its object to provide a speech synthesizing method and apparatus as well as a storage medium, which enable setting the phoneme duration for a phoneme string so as to achieve a specified speech-production time, and which can provide a natural phoneme duration regardless of the length of the speech-production time.
In order to attain the above object, the speech synthesizing apparatus according to an embodiment of the present invention has the following configuration. More specifically, the speech synthesizing apparatus for performing speech synthesis according to an inputted phoneme string comprises: storage means for storing statistical data related to a phoneme duration of each phoneme; determining means for determining a speech-production time of a phoneme string in a predetermined section; setting means for setting the phoneme duration, corresponding to the speech-production time, of each phoneme constituting the phoneme string, based on the statistical data of each phoneme obtained from the storage means; and generating means for generating a speech waveform by connecting phonemes using the phoneme duration.
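By way of illustration only, the setting means described above can be sketched as follows. The sketch assumes that the stored statistical data are an average value and a standard deviation of the duration of each phoneme, and it assumes one possible allocation rule that this section does not itself specify: each phoneme is moved away from its average in proportion to its variance, so that the durations add up to the specified speech-production time. The names `PhonemeStats`, `set_durations` and the numerical values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PhonemeStats:
    """Hypothetical per-phoneme statistics held by the storage means."""
    mean_ms: float  # average duration in milliseconds
    std_ms: float   # standard deviation of the duration in milliseconds


def set_durations(phonemes: List[str],
                  stats: Dict[str, PhonemeStats],
                  total_ms: float) -> List[float]:
    """Assign a duration to each phoneme so that the sum equals total_ms.

    Assumed rule (not taken from the text): d_i = mean_i + rho * var_i,
    with rho chosen so that the durations sum to the specified time.
    Phonemes with a larger variance absorb more of the adjustment.
    No lower bound is enforced, so a very short total_ms can yield
    unrealistically small durations.
    """
    if not phonemes:
        return []
    means = [stats[p].mean_ms for p in phonemes]
    variances = [stats[p].std_ms ** 2 for p in phonemes]
    slack = total_ms - sum(means)  # time to add (or remove if negative)
    var_sum = sum(variances)
    if var_sum == 0.0:
        # No variance information: spread the slack evenly.
        share = slack / len(phonemes)
        return [m + share for m in means]
    rho = slack / var_sum
    return [m + rho * v for m, v in zip(means, variances)]


# Illustrative statistics only; real values would come from speech data.
example_stats = {
    "k": PhonemeStats(mean_ms=60.0, std_ms=10.0),
    "a": PhonemeStats(mean_ms=90.0, std_ms=20.0),
    "o": PhonemeStats(mean_ms=100.0, std_ms=30.0),
}
example_durations = set_durations(["k", "a", "k", "o"], example_stats, 340.0)
```

In this sketch the averages total 310 ms, so the remaining 30 ms of the specified 340 ms is distributed mostly to the phonemes whose durations vary the most, while the total is met exactly.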
Furthermore, the present invention provides a speech synthesizing method executed by the above speech synthesizing apparatus. Moreover, the present invention provides a storage medium storing control programs for causing a computer to realize the above speech synthesizing method.
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.